Clustering of unevenly sampled gene expression time-series data

Authors

  • Carla S. Möller-Levet
  • Frank Klawonn
  • Kwang-Hyun Cho
  • Hujun Yin
  • Olaf Wolkenhauer
Abstract

Time-course measurements are becoming a common type of microarray experiment. The temporal order of the data and the varying length of the sampling intervals are important and should be considered when clustering time series. However, gene expression time series are short, which limits the use of conventional statistical models and techniques for time-series analysis. To address this problem, this paper proposes the Fuzzy Short Time-Series (FSTS) clustering algorithm, which clusters profiles based on the similarity of their relative change of expression level and the corresponding temporal information. One of the major advantages of fuzzy clustering is that genes can belong to more than one group, revealing distinctive features of each gene’s function and regulation. Several examples are provided to illustrate the performance of the proposed algorithm. In addition, we validate the algorithm by clustering the genes that define the model profiles in Chu et al. (1998, Science, vol. 282, pp. 699-705). The FSTS algorithm is compared with fuzzy c-means, k-means, average-linkage hierarchical clustering and random clustering. Performance is evaluated with a well-established cluster validity measure, which shows that the FSTS algorithm outperforms the compared algorithms at clustering similar rates of change of expression over successive, unevenly distributed time points. Moreover, the FSTS algorithm clustered the genes defining the model profiles in a biologically meaningful way.
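The abstract does not give the distance itself, so the following is only a minimal sketch of the idea of comparing "relative change of expression level and the corresponding temporal information". It assumes that similarity is measured on the slopes between successive time points and that memberships follow the standard fuzzy c-means formula. The function names, the fuzzifier `m`, and the example profiles are hypothetical and are not taken from the paper.

```python
def slopes(values, times):
    """Rate of change between successive, possibly unevenly spaced, time points."""
    if len(values) != len(times):
        raise ValueError("values and times must have the same length")
    return [
        (values[k + 1] - values[k]) / (times[k + 1] - times[k])
        for k in range(len(values) - 1)
    ]


def slope_distance_sq(x, v, times):
    """Squared distance between two profiles, measured on their slopes.

    Assumption: this slope-based distance is an illustrative choice, not
    necessarily the exact measure used by the FSTS algorithm.
    """
    return sum((a - b) ** 2 for a, b in zip(slopes(x, times), slopes(v, times)))


def fuzzy_memberships(x, prototypes, times, m=2.0):
    """Standard fuzzy c-means memberships of profile x in each prototype."""
    d = [slope_distance_sq(x, v, times) for v in prototypes]
    exact = [i for i, di in enumerate(d) if di == 0.0]
    if exact:
        # x coincides (in shape) with one or more prototypes.
        return [1.0 / len(exact) if i in exact else 0.0 for i in range(len(d))]
    exponent = 1.0 / (m - 1.0)
    weights = [(1.0 / di) ** exponent for di in d]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical, unevenly spaced sampling times and prototype profiles.
times = [0.0, 1.0, 3.0]
rising = [0.0, 1.0, 3.0]      # slopes: 1, 1
falling = [0.0, -1.0, -3.0]   # slopes: -1, -1
gene = [0.0, 1.0, 1.0]        # slopes: 1, 0

memberships = fuzzy_memberships(gene, [rising, falling], times)
```

Because the distance depends only on slopes, a profile shifted by a constant offset has zero distance to the original, and a gene with an intermediate shape receives partial membership in more than one cluster, which is the property of fuzzy clustering the abstract highlights.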


Similar resources

Fuzzy Clustering of Short Time-Series and Unevenly Distributed Sampling Points

This paper proposes a new clustering algorithm in the fuzzy-c-means family, which is designed to cluster time series and is particularly suited for short time series and those with unevenly spaced sampling points. Short time series, which do not allow a conventional statistical model, and unevenly sampled time series appear in many practical situations. The algorithm developed here is motivated...


Representing Unevenly-Spaced Time Series Data for Visualization and Interactive Exploration

Visualizing time series data is useful to support discovery of relations and patterns in financial, genomic, medical and other applications. In most time series, measurements are equally spaced over time. This paper discusses the challenges for unevenly-spaced time series data and presents four methods to represent them: sampled events, aggregated sampled events, event index and interleaved eve...



Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms

MOTIVATION: Periodic patterns in time series resulting from biological experiments are of great interest. The commonly used Fast Fourier Transform (FFT) algorithm is applicable only when data are evenly spaced and when no values are missing, which is not always the case in high-throughput measurements. The choice of statistic to evaluate the significance of the periodic patterns for unevenly spa...


H∞ Sampled-Data Controller Design for Stochastic Genetic Regulatory Networks

Artificially regulating gene expression is an important step in developing new treatments for system-level diseases such as cancer. In this paper, we propose a method to regulate gene expression based on sampled-data measurements of gene product concentrations. The inherently noisy behaviour of gene regulatory networks is modeled with stochastic nonlinear differential equations. To synthesize feed...



Journal:
  • Fuzzy Sets and Systems

Volume 152

Publication date 2005